Bottom-Up Exploration

This notebook documents an exploration of a bottom-up strategy for determining notebook similarity. It is based on the notion that it is easier to aggregate small pieces than to break down a 'black box.'

The biggest challenge is working with the AST structure. Because it is a tree, we need to merge leaves with their parents, working our way up.

Goal

The goal is to come up with a similarity function for entire notebooks.
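
To make the direction of travel concrete before diving in, here is a minimal sketch (not the similarity function itself) of bottom-up aggregation over an AST: a recursive postorder walk that folds each node's children into a label for the node, so information flows from the leaves up to the root. The `merge_label` helper is hypothetical, purely illustrative.

In [ ]:
import ast

def merge_label(node):
    # Postorder: label the children first, then fold those labels into
    # the parent's label, so information flows leaves -> root
    child_labels = [merge_label(child) for child in ast.iter_child_nodes(node)]
    name = type(node).__name__
    if not child_labels:
        return name
    return name + '(' + ','.join(child_labels) + ')'

# e.g. Module(Assign(Name(Store),BinOp(Name(Load),Add,Num)));
# exact leaf names depend on the Python version
print(merge_label(ast.parse('x = y + 1')))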


In [1]:
# Necessary imports 
import os
import time
from nbminer.notebook_miner import NotebookMiner
from nbminer.cells.cells import Cell
from nbminer.features.ast_features import ASTFeatures
from nbminer.stats.summary import Summary
from nbminer.stats.multiple_summary import MultipleSummary

In [2]:
# Loading in the notebooks
people = os.listdir('../testbed/Final')
notebooks = []
for person in people:
    person = os.path.join('../testbed/Final', person)
    if os.path.isdir(person):
        direc = os.listdir(person)
        notebooks.extend([os.path.join(person, filename) for filename in direc if filename.endswith('.ipynb')])
notebook_objs = [NotebookMiner(file) for file in notebooks]
a = ASTFeatures(notebook_objs)

In [3]:
# For each notebook, break it up into its top-level AST nodes
for i, nb in enumerate(a.nb_features):
    a.nb_features[i] = nb.get_new_notebook()

In [4]:
import networkx
from collections import deque
import ast

In [5]:
# Function that returns a networkx graph from a top-level AST node
def return_graph(node):
    dgraph = networkx.DiGraph()
    nodes = deque()
    nodes.append(node.body[0])
    dgraph.add_node(node.body[0])
    while len(nodes) != 0:
        # Pop a pending node and attach each of its AST children to the graph
        cur_node = nodes.pop()
        for child in ast.iter_child_nodes(cur_node):
            dgraph.add_node(child)
            dgraph.add_edge(cur_node, child)
            nodes.append(child)
    return dgraph

# Function that returns a list of these graphs (plus their roots and cells)
# for all top-level nodes in all notebooks
def return_all_graphs(a):
    graphs = []
    roots = []
    cells = []
    for nb in a.nb_features:
        for cell in nb.get_all_cells():
            graphs.append(return_graph(cell.get_feature('ast')))
            roots.append(cell.get_feature('ast').body[0])
            cells.append(cell)
    return graphs, roots, cells
# Call to retrieve all graphs
all_graphs, all_roots, all_cells = return_all_graphs(a)
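
As a quick sanity check (an ad-hoc cell, not part of the analysis above), we can run `return_graph` on a single parsed statement. Since each AST node object occurs exactly once, the result should be a tree, with one fewer edge than nodes.

In [ ]:
# Hypothetical mini-example: the graph for a lone assignment statement
demo_module = ast.parse('x = y + 1')
demo_graph = return_graph(demo_module)
# For a tree, edges == nodes - 1
print(demo_graph.number_of_nodes(), demo_graph.number_of_edges())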

In [6]:
len(all_graphs)


Out[6]:
19882

Size of the AST trees

To look at the size of the AST trees, I'll create a histogram of tree depths, where depth is the maximum shortest-path length from the root node to any other node in the graph.


In [7]:
max_values = []
for n in range(len(all_graphs)):
    max_values.append(max(networkx.shortest_path_length(all_graphs[n], all_roots[n]).values()))

In [8]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.hist(max_values, bins=20)


Out[8]:
(array([  1.20530000e+04,   4.50200000e+03,   2.15600000e+03,
          7.93000000e+02,   3.24000000e+02,   4.80000000e+01,
          5.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   1.00000000e+00]),
 array([  1.  ,   3.15,   5.3 ,   7.45,   9.6 ,  11.75,  13.9 ,  16.05,
         18.2 ,  20.35,  22.5 ,  24.65,  26.8 ,  28.95,  31.1 ,  33.25,
         35.4 ,  37.55,  39.7 ,  41.85,  44.  ]),
 <a list of 20 Patch objects>)

Which node is huge

This is a fairly good result: most of the trees have few levels. However, some of them are really big; five have more than 13 levels, and one reaches a depth of about 44. Let's take a look at the worst case we got.



In [9]:
sorted_indices = [i[0] for i in sorted(enumerate(max_values), key=lambda x:x[1])]

In [10]:
max_values[sorted_indices[-1]]
ast.dump(all_cells[sorted_indices[-1]].get_feature('ast'))


Out[10]:
'Module(body=[FunctionDef(name=\'computTheCorpus\', args=arguments(args=[arg(arg=\'data_input\', annotation=None), arg(arg=\'nbOfCorpus\', annotation=None)], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]), body=[Assign(targets=[Name(id=\'data_text\', ctx=Store())], value=Call(func=Attribute(value=Subscript(value=Name(id=\'data_input\', ctx=Load()), slice=Index(value=Str(s=\'text\')), ctx=Load()), attr=\'dropna\', ctx=Load()), args=[], keywords=[])), Assign(targets=[Name(id=\'data_text\', ctx=Store())], value=Call(func=Attribute(value=Name(id=\'data_text\', ctx=Load()), attr=\'apply\', ctx=Load()), args=[Lambda(args=arguments(args=[arg(arg=\'x\', annotation=None)], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]), body=Call(func=Attribute(value=Name(id=\'x\', ctx=Load()), attr=\'lower\', ctx=Load()), args=[], keywords=[]))], keywords=[])), Assign(targets=[Name(id=\'data_text\', ctx=Store())], value=Call(func=Attribute(value=Name(id=\'data_text\', ctx=Load()), attr=\'apply\', ctx=Load()), args=[Lambda(args=arguments(args=[arg(arg=\'x\', annotation=None)], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]), body=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Call(func=Attribute(value=Name(id=\'x\', ctx=Load()), attr=\'replace\', ctx=Load()), args=[Str(s=\':\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'—\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'-\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'.\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\',\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'.\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'<\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'>\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'=\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'•\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'\\\\\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'\\n\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'^\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'\\\\\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'!\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'?\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'"\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'/\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s=\'@\'), Str(s=\' \')], keywords=[]), attr=\'replace\', ctx=Load()), args=[Str(s="\'"), Str(s=\' \')], keywords=[]))], keywords=[])), Assign(targets=[Name(id=\'texts\', ctx=Store())], value=ListComp(elt=ListComp(elt=Name(id=\'word\', ctx=Load()), generators=[comprehension(target=Name(id=\'word\', ctx=Store()), 
iter=Call(func=Attribute(value=Call(func=Attribute(value=Name(id=\'document\', ctx=Load()), attr=\'lower\', ctx=Load()), args=[], keywords=[]), attr=\'split\', ctx=Load()), args=[], keywords=[]), ifs=[BoolOp(op=And(), values=[Compare(left=Call(func=Name(id=\'len\', ctx=Load()), args=[Name(id=\'word\', ctx=Load())], keywords=[]), ops=[Gt()], comparators=[Num(n=1)]), Call(func=Attribute(value=Name(id=\'word\', ctx=Load()), attr=\'startswith\', ctx=Load()), args=[Str(s=\'#\')], keywords=[])])], is_async=0)]), generators=[comprehension(target=Name(id=\'document\', ctx=Store()), iter=Name(id=\'data_text\', ctx=Load()), ifs=[], is_async=0)])), Assign(targets=[Name(id=\'frequency\', ctx=Store())], value=Call(func=Name(id=\'defaultdict\', ctx=Load()), args=[Name(id=\'int\', ctx=Load())], keywords=[])), For(target=Name(id=\'text\', ctx=Store()), iter=Name(id=\'texts\', ctx=Load()), body=[For(target=Name(id=\'token\', ctx=Store()), iter=Name(id=\'text\', ctx=Load()), body=[AugAssign(target=Subscript(value=Name(id=\'frequency\', ctx=Load()), slice=Index(value=Name(id=\'token\', ctx=Load())), ctx=Store()), op=Add(), value=Num(n=1))], orelse=[])], orelse=[]), Assign(targets=[Name(id=\'texts\', ctx=Store())], value=ListComp(elt=ListComp(elt=Name(id=\'token\', ctx=Load()), generators=[comprehension(target=Name(id=\'token\', ctx=Store()), iter=Name(id=\'text\', ctx=Load()), ifs=[Compare(left=Subscript(value=Name(id=\'frequency\', ctx=Load()), slice=Index(value=Name(id=\'token\', ctx=Load())), ctx=Load()), ops=[Gt()], comparators=[Num(n=1)])], is_async=0)]), generators=[comprehension(target=Name(id=\'text\', ctx=Store()), iter=Name(id=\'texts\', ctx=Load()), ifs=[], is_async=0)])), Assign(targets=[Name(id=\'corpus\', ctx=Store())], value=ListComp(elt=Call(func=Attribute(value=Name(id=\'dictionary\', ctx=Load()), attr=\'doc2bow\', ctx=Load()), args=[Name(id=\'text\', ctx=Load())], keywords=[]), generators=[comprehension(target=Name(id=\'text\', ctx=Store()), iter=Name(id=\'texts\', ctx=Load()), ifs=[], is_async=0)])), Assign(targets=[Name(id=\'model\', ctx=Store())], value=Call(func=Attribute(value=Name(id=\'models\', ctx=Load()), attr=\'LdaModel\', ctx=Load()), args=[Name(id=\'corpus\', ctx=Load())], keywords=[keyword(arg=\'id2word\', value=Name(id=\'dictionary\', ctx=Load())), keyword(arg=\'num_topics\', value=Name(id=\'nbOfCorpus\', ctx=Load()))])), Return(value=Name(id=\'model\', ctx=Load()))], decorator_list=[], returns=None)])'

In [11]:
# Code that created it
print (all_cells[sorted_indices[-1]].get_feature('original_code'))


# coding: utf-8

# In[ ]:

def computTheCorpus(data_input, nbOfCorpus):
    
    #first remove the  Nan value if any
    data_text = data_input['text'].dropna()
    
    #For cleaning text I first transfom all characters into lowercase After that I remove , . 
    #and other punctuactions signs by an espace.
    data_text = data_text.apply(lambda x: x.lower())
    data_text = data_text.apply(lambda x: x                                .replace(':',' ')                                .replace('—',' ')                                .replace('-',' ')                                .replace('.',' ')                                .replace(',',' ')                                .replace('.',' ')                                .replace('<',' ')                                .replace('>',' ')                                .replace('=',' ')                                .replace('•',' ')                                .replace("\\",' ')                                .replace('\n', ' ')                                .replace('^',' ')                                .replace('\\',' ')                                .replace('!',' ')                                .replace('?',' ')                                .replace('"',' ')                                .replace('/',' ')                                .replace('@',' ')                                .replace('\'',' ')                               )
    
    ## I don't need stop list because I will just use the word that begin with # (because on twitter hastag represent 
    ## "content" of tweet) (code inspired from homework but modified)
    
    texts = [[word for word in document.lower().split() if len(word) > 1 and word.startswith('#')]
         for document in data_text]
    
    # counts the number of appartion of a word
    frequency = defaultdict(int)
    for text in texts:
        for token in text:
            frequency[token] += 1

    # remove word that apprears only once
    texts = [[token for token in text if frequency[token] > 1]
          for text in texts] 
    
    corpus = [dictionary.doc2bow(text) for text in texts]
    
    model = models.LdaModel(corpus, id2word=dictionary, num_topics=nbOfCorpus)
    
    
    
    return model


Which nodes are big

OK, that piece of code is kind of ridiculous. How about some of the smaller big ASTs?


In [12]:
print (all_cells[sorted_indices[-2]].get_feature('original_code'))


# coding: utf-8

# In[ ]:

epfl_clean[epfl_clean['retweet_count'] != 6095].groupby('year').sum()['retweet_count'].plot(kind='bar').set_xlabel('EPFL')



In [20]:
print (ast.dump(all_cells[sorted_indices[-3]].get_feature('ast')))


Module(body=[FunctionDef(name='plotGraph', args=arguments(args=[arg(arg='i', annotation=None), arg(arg='j', annotation=None)], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]), body=[Assign(targets=[Name(id='grouped', ctx=Store())], value=Subscript(value=Name(id='grouped_all', ctx=Load()), slice=Index(value=Name(id='i', ctx=Load())), ctx=Load())), Assign(targets=[Name(id='aggregated', ctx=Store())], value=Subscript(value=Name(id='aggregated_all', ctx=Load()), slice=Index(value=Name(id='j', ctx=Load())), ctx=Load())), Assign(targets=[Name(id='groups', ctx=Store())], value=List(elts=[Name(id='grouped', ctx=Load()), Name(id='aggregated', ctx=Load())], ctx=Load())), Assign(targets=[Name(id='x_var', ctx=Store())], value=Subscript(value=Name(id='x_vars', ctx=Load()), slice=Index(value=Name(id='i', ctx=Load())), ctx=Load())), Assign(targets=[Name(id='ax', ctx=Store())], value=Call(func=Attribute(value=Name(id='fig', ctx=Load()), attr='add_subplot', ctx=Load()), args=[Num(n=3), Num(n=2), BinOp(left=BinOp(left=BinOp(left=Name(id='i', ctx=Load()), op=Mult(), right=Num(n=2)), op=Add(), right=Name(id='j', ctx=Load())), op=Add(), right=Num(n=1))], keywords=[])), For(target=Name(id='axis', ctx=Store()), iter=List(elts=[Attribute(value=Name(id='ax', ctx=Load()), attr='xaxis', ctx=Load()), Attribute(value=Name(id='ax', ctx=Load()), attr='yaxis', ctx=Load())], ctx=Load()), body=[Expr(value=Call(func=Attribute(value=Call(func=Attribute(value=Name(id='axis', ctx=Load()), attr='get_major_formatter', ctx=Load()), args=[], keywords=[]), attr='set_useOffset', ctx=Load()), args=[NameConstant(value=False)], keywords=[]))], orelse=[]), If(test=Compare(left=Name(id='i', ctx=Load()), ops=[NotEq()], comparators=[Num(n=2)]), body=[Expr(value=Call(func=Attribute(value=Name(id='plt', ctx=Load()), attr='plot', ctx=Load()), args=[Name(id='x_var', ctx=Load()), Call(func=Attribute(value=Call(func=Attribute(value=Subscript(value=Name(id='df_epfl_2', ctx=Load()), slice=Index(value=Name(id='groups', ctx=Load())), ctx=Load()), attr='groupby', ctx=Load()), args=[Name(id='grouped', ctx=Load())], keywords=[]), attr='agg', ctx=Load()), args=[Name(id='sum', ctx=Load())], keywords=[]), Str(s='bo')], keywords=[keyword(arg='label', value=Str(s='epfl'))])), Expr(value=Call(func=Attribute(value=Name(id='plt', ctx=Load()), attr='plot', ctx=Load()), args=[Name(id='x_var', ctx=Load()), Call(func=Attribute(value=Call(func=Attribute(value=Subscript(value=Name(id='df_eth_2', ctx=Load()), slice=Index(value=Name(id='groups', ctx=Load())), ctx=Load()), attr='groupby', ctx=Load()), args=[Name(id='grouped', ctx=Load())], keywords=[]), attr='agg', ctx=Load()), args=[Name(id='sum', ctx=Load())], keywords=[]), Str(s='ro')], keywords=[keyword(arg='label', value=Str(s='eth'))])), Expr(value=Call(func=Attribute(value=Name(id='plt', ctx=Load()), attr='legend', ctx=Load()), args=[], keywords=[keyword(arg='loc', value=Str(s='upper left'))]))], orelse=[]), If(test=Compare(left=Name(id='i', ctx=Load()), ops=[Eq()], comparators=[Num(n=2)]), body=[Expr(value=Call(func=Attribute(value=Name(id='plt', ctx=Load()), attr='plot', ctx=Load()), args=[Name(id='x_var', ctx=Load()), Call(func=Attribute(value=Subscript(value=BinOp(left=Name(id='hours_df', ctx=Load()), op=Add(), right=Call(func=Attribute(value=Call(func=Attribute(value=Subscript(value=Name(id='df_epfl_2', ctx=Load()), slice=Index(value=Name(id='groups', ctx=Load())), ctx=Load()), attr='groupby', ctx=Load()), args=[Name(id='grouped', ctx=Load())], keywords=[]), attr='agg', ctx=Load()), 
args=[Name(id='sum', ctx=Load())], keywords=[])), slice=Index(value=Name(id='aggregated', ctx=Load())), ctx=Load()), attr='fillna', ctx=Load()), args=[Num(n=0)], keywords=[]), Str(s='bo')], keywords=[keyword(arg='label', value=Str(s='epfl'))])), Expr(value=Call(func=Attribute(value=Name(id='plt', ctx=Load()), attr='plot', ctx=Load()), args=[Name(id='x_var', ctx=Load()), Call(func=Attribute(value=Subscript(value=BinOp(left=Name(id='hours_df', ctx=Load()), op=Add(), right=Call(func=Attribute(value=Call(func=Attribute(value=Subscript(value=Name(id='df_eth_2', ctx=Load()), slice=Index(value=Name(id='groups', ctx=Load())), ctx=Load()), attr='groupby', ctx=Load()), args=[Name(id='grouped', ctx=Load())], keywords=[]), attr='agg', ctx=Load()), args=[Name(id='sum', ctx=Load())], keywords=[])), slice=Index(value=Name(id='aggregated', ctx=Load())), ctx=Load()), attr='fillna', ctx=Load()), args=[Num(n=0)], keywords=[]), Str(s='ro')], keywords=[keyword(arg='label', value=Str(s='eth'))])), Expr(value=Call(func=Attribute(value=Name(id='plt', ctx=Load()), attr='legend', ctx=Load()), args=[], keywords=[keyword(arg='loc', value=Str(s='upper left'))]))], orelse=[]), Expr(value=Call(func=Attribute(value=Name(id='plt', ctx=Load()), attr='title', ctx=Load()), args=[Call(func=Attribute(value=Str(s='{0}, {1}'), attr='format', ctx=Load()), args=[Name(id='aggregated', ctx=Load()), Name(id='grouped', ctx=Load())], keywords=[])], keywords=[]))], decorator_list=[], returns=None)])

In [14]:
print (all_cells[sorted_indices[-4]].get_feature('original_code'))


# coding: utf-8

# In[ ]:

deth[deth['hour'] == 12]['retweet_count'].copy().sort_values().plot.bar()



In [15]:
print (all_cells[sorted_indices[-5]].get_feature('original_code'))


# coding: utf-8

# In[ ]:

# Here we normalize the text, the code is taken from 
#https://github.com/heerme/twitter-topics/blob/master/twitter-topics-from-json-text-stream.py
def normalize_text(text):
    if type(text) is not str:
        print(text)
    text = re.sub('((www\.[^\s]+)|(https?://[^\s]+)|(pic\.twitter\.com/[^\s]+))','', text)
    text = re.sub('rt','',text)
    text = re.sub('RT','',text)
    
    text = re.sub('@[^\s]+','', text)
    text = re.sub('#([^\s]+)', '', text)
    text = re.sub('[:;>?<=*+()/,\-#!$%\{˜|\}\[^_\\@\]1234567890’‘]',' ', text)
    text = re.sub('[\d]','', text)
    text = text.replace(".", '')
    text = text.replace("'", ' ')
    text = text.replace("\"", ' ')
    #text = text.replace("-", " ")
    #normalize some utf8 encoding
    text = text.replace("\x9d",' ').replace("\x8c",' ')
    text = text.replace("\xa0",' ')
    text = text.replace("\x9d\x92", ' ').replace("\x9a\xaa\xf0\x9f\x94\xb5", ' ').replace("\xf0\x9f\x91\x8d\x87\xba\xf0\x9f\x87\xb8", ' ').replace("\x9f",' ').replace("\x91\x8d",' ')
    text = text.replace("\xf0\x9f\x87\xba\xf0\x9f\x87\xb8",' ').replace("\xf0",' ').replace('\xf0x9f','').replace("\x9f\x91\x8d",' ').replace("\x87\xba\x87\xb8",' ')	
    text = text.replace("\xe2\x80\x94",' ').replace("\x9d\xa4",' ').replace("\x96\x91",' ').replace("\xe1\x91\xac\xc9\x8c\xce\x90\xc8\xbb\xef\xbb\x89\xd4\xbc\xef\xbb\x89\xc5\xa0\xc5\xa0\xc2\xb8",' ')
    text = text.replace("\xe2\x80\x99s", " ").replace("\xe2\x80\x98", ' ').replace("\xe2\x80\x99", ' ').replace("\xe2\x80\x9c", " ").replace("\xe2\x80\x9d", " ")
    text = text.replace("\xe2\x82\xac", " ").replace("\xc2\xa3", " ").replace("\xc2\xa0", " ").replace("\xc2\xab", " ").replace("\xf0\x9f\x94\xb4", " ").replace("\xf0\x9f\x87\xba\xf0\x9f\x87\xb8\xf0\x9f", "")
    return text


Exploring the immediate children

In order to work bottom up, we need to look at the leaf nodes and their parents. This is made simpler by the fixed ordering of children that certain node types have. Descriptions of each node type can be found at http://greentreesnakes.readthedocs.io/en/latest/nodes.html

Many node types always have the same number of children. However, some have a variable number -- Assign is a good example of this. All three of the following are valid:

  • x = 1
  • x, y = 1, 2
  • x, y, z = 1, 2, 3

Let's look at both the size and form of a node's children to see if we can come up with a first pass at the bottom-up approach.
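
As a quick ad-hoc check (not part of the pipeline below), we can ask `ast.iter_child_nodes` what the immediate children of each Assign look like. Note that tuple unpacking wraps the targets and values in a single Tuple node each, so part of the variability shows up one level below the Assign rather than at the Assign itself.

In [ ]:
for src in ['x = 1', 'x, y = 1, 2', 'x, y, z = 1, 2, 3']:
    assign = ast.parse(src).body[0]
    print(src, '->', [type(c).__name__ for c in ast.iter_child_nodes(assign)])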


In [16]:
len(all_graphs), len(all_roots)


Out[16]:
(19882, 19882)

In [17]:
def traverse_graph(g, cur_d):
    # For each node in the graph, record the set of its children's types.
    # cur_d maps node type -> list of child-type sets, one per occurrence.
    for node in networkx.dfs_preorder_nodes(g):
        t_node = type(node)
        if t_node not in cur_d:
            cur_d[t_node] = []
        child_set = set()
        for child in g[node]:
            child_set.add(type(child))
        cur_d[t_node].append(child_set)
    return cur_d
my_dict = {}
for g in all_graphs:
    my_dict = traverse_graph(g, my_dict)

In [18]:
import numpy as np
# For each node type: the distinct sizes of its child-type sets, and how
# many times the type occurs across all graphs
for key in my_dict.keys():
    print (key, np.unique(np.array([len(s) for s in my_dict[key]])), len(my_dict[key]))


<class '_ast.Import'> [1] 1150
<class '_ast.alias'> [0] 2350
<class '_ast.ImportFrom'> [1] 1010
<class '_ast.Expr'> [1] 9587
<class '_ast.Call'> [1 2 3 4 5] 22610
<class '_ast.Attribute'> [2] 24188
<class '_ast.Name'> [1] 53355
<class '_ast.Load'> [0] 17407
<class '_ast.Str'> [0] 19824
<class '_ast.Assign'> [1 2] 10985
<class '_ast.Store'> [0] 9269
<class '_ast.Num'> [0] 7477
<class '_ast.Subscript'> [3] 9562
<class '_ast.Index'> [1] 9232
<class '_ast.Compare'> [2 3] 1291
<class '_ast.BinOp'> [2 3] 1447
<class '_ast.Mod'> [0] 400
<class '_ast.Eq'> [0] 597
<class '_ast.For'> [2 3 4 5 6] 904
<class '_ast.List'> [1 2 3 4] 2607
<class '_ast.keyword'> [1] 6269
<class '_ast.Add'> [0] 399
<class '_ast.Lambda'> [2] 957
<class '_ast.arguments'> [0 1 2 3 4 5] 1657
<class '_ast.arg'> [0] 2045
<class '_ast.If'> [2 3 4] 282
<class '_ast.NotEq'> [0] 97
<class '_ast.Slice'> [0 1 2] 442
<class '_ast.UnaryOp'> [2] 193
<class '_ast.USub'> [0] 114
<class '_ast.NameConstant'> [0] 1069
<class '_ast.ListComp'> [2] 819
<class '_ast.comprehension'> [1 2 3] 886
<class '_ast.Dict'> [0 1 2 3 4 5] 292
<class '_ast.Gt'> [0] 194
<class '_ast.FunctionDef'> [2 3 4 5 6] 700
<class '_ast.Return'> [0 1] 565
<class '_ast.Tuple'> [1 2 3] 1109
<class '_ast.BoolOp'> [2 3] 66
<class '_ast.Or'> [0] 15
<class '_ast.In'> [0] 92
<class '_ast.NotIn'> [0] 125
<class '_ast.Break'> [0] 1
<class '_ast.AugAssign'> [2 3] 132
<class '_ast.GeneratorExp'> [2] 28
<class '_ast.Is'> [0] 6
<class '_ast.Not'> [0] 36
<class '_ast.Set'> [1] 12
<class '_ast.Mult'> [0] 102
<class '_ast.With'> [2 3] 43
<class '_ast.withitem'> [2] 43
<class '_ast.Div'> [0] 163
<class '_ast.And'> [0] 50
<class '_ast.Sub'> [0] 124
<class '_ast.ExtSlice'> [1 2] 112
<class '_ast.IfExp'> [2 3] 32
<class '_ast.Delete'> [1] 269
<class '_ast.Del'> [0] 268
<class '_ast.Try'> [2 3 4] 18
<class '_ast.Yield'> [0 1] 4
<class '_ast.GtE'> [0] 19
<class '_ast.BitAnd'> [0] 20
<class '_ast.DictComp'> [2 3] 17
<class '_ast.While'> [2] 2
<class '_ast.Lt'> [0] 50
<class '_ast.ExceptHandler'> [1 2] 17
<class '_ast.Pow'> [0] 39
<class '_ast.Invert'> [0] 28
<class '_ast.Continue'> [0] 9
<class '_ast.LtE'> [0] 11
<class '_ast.SetComp'> [2] 10
<class '_ast.IsNot'> [0] 7
<class '_ast.BitOr'> [0] 14
<class '_ast.Starred'> [2] 13
<class '_ast.FloorDiv'> [0] 1
<class '_ast.ClassDef'> [2] 1
<class '_ast.BitXor'> [0] 1
